AITopics | Heilongjiang Province

Visual Grounding aims to localize the referring object in an image given a natural language expression. Recent advancements in DETR-based visual grounding methods have attracted considerable attention, as they directly predict the coordinates of the target object without relying on additional efforts, such as pre-generated proposal candidates or pre-defined anchor boxes. However, existing research primarily focuses on designing stronger multi-modal decoder, which typically generates learnable queries by random initialization or by using linguistic embeddings. This vanilla query generation approach inevitably increases the learning difficulty for the model, as it does not involve any target-related information at the beginning of decoding. Furthermore, they only use the deepest image feature during the query learning process, overlooking the importance of features from other levels.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Heilongjiang Province (0.14)
Asia > China > Shaanxi Province (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

MoK-RAG: Mixture of Knowledge Paths Enhanced Retrieval-Augmented Generation for Embodied AI Environments

Guo, Zhengsheng, Zheng, Linwei, Chen, Xinyang, Bai, Xuefeng, Chen, Kehai, Zhang, Min

arXiv.org Artificial IntelligenceMar-18-2025

While human cognition inherently retrieves information from diverse and specialized knowledge sources during decision-making processes, current Retrieval-Augmented Generation (RAG) systems typically operate through single-source knowledge retrieval, leading to a cognitive-algorithmic discrepancy. To bridge this gap, we introduce MoK-RAG, a novel multi-source RAG framework that implements a mixture of knowledge paths enhanced retrieval mechanism through functional partitioning of a large language model (LLM) corpus into distinct sections, enabling retrieval from multiple specialized knowledge paths. Applied to the generation of 3D simulated environments, our proposed MoK-RAG3D enhances this paradigm by partitioning 3D assets into distinct sections and organizing them based on a hierarchical knowledge tree structure. Different from previous methods that only use manual evaluation, we pioneered the introduction of automated evaluation methods for 3D scenes. Both automatic and human evaluations in our experiments demonstrate that MoK-RAG3D can assist Embodied AI agents in generating diverse scenes.

computational linguistic, mok-rag3d, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2503.13882

Country:

North America > United States (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Heilongjiang Province (0.14)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation

Zhang, Kaixin, Wang, Hongzhi, Li, Ziqi, Lu, Yabin, Li, Yingze, Yan, Yu, Guan, Yiming

arXiv.org Artificial IntelligenceMar-11-2025

Research on learned cardinality estimation has achieved significant progress in recent years. However, existing methods still face distinct challenges that hinder their practical deployment in production environments. We conceptualize these challenges as the "Trilemma of Cardinality Estimation", where learned cardinality estimation methods struggle to balance generality, accuracy, and updatability. To address these challenges, we introduce DistJoin, a join cardinality estimator based on efficient distribution prediction using multi-autoregressive models. Our contributions are threefold: (1) We propose a method for estimating both equi and non-equi join cardinality by leveraging the conditional probability distributions of individual tables in a decoupled manner. (2) To meet the requirements of efficient training and inference for DistJoin, we develop Adaptive Neural Predicate Modulation (ANPM), a high-throughput conditional probability distribution estimation model. (3) We formally analyze the variance of existing similar methods and demonstrate that such approaches suffer from variance accumulation issues. To mitigate this problem, DistJoin employs a selectivity-based approach rather than a count-based approach to infer join cardinality, effectively reducing variance. In summary, DistJoin not only represents the first data-driven method to effectively support both equi and non-equi joins but also demonstrates superior accuracy while enabling fast and flexible updates. We evaluate DistJoin on JOB-light and JOB-light-ranges, extending the evaluation to non-equi join conditions. The results demonstrate that our approach achieves the highest accuracy, robustness to data updates, generality, and comparable update and inference speed relative to existing methods.

distjoin, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.08994

Country: Asia > China > Heilongjiang Province (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Adaptive-LIO: Enhancing Robustness and Precision through Environmental Adaptation in LiDAR Inertial Odometry

Zhao, Chengwei, Hu, Kun, Xu, Jie, Zhao, Lijun, Han, Baiwen, Wu, Kaidi, Tian, Maoshan, Yuan, Shenghai

arXiv.org Artificial IntelligenceMar-6-2025

The emerging Internet of Things (IoT) applications, such as driverless cars, have a growing demand for high-precision positioning and navigation. Nowadays, LiDAR inertial odometry becomes increasingly prevalent in robotics and autonomous driving. However, many current SLAM systems lack sufficient adaptability to various scenarios. Challenges include decreased point cloud accuracy with longer frame intervals under the constant velocity assumption, coupling of erroneous IMU information when IMU saturation occurs, and decreased localization accuracy due to the use of fixed-resolution maps during indoor-outdoor scene transitions. To address these issues, we propose a loosely coupled adaptive LiDAR-Inertial-Odometry named \textbf{Adaptive-LIO}, which incorporates adaptive segmentation to enhance mapping accuracy, adapts motion modality through IMU saturation and fault detection, and adjusts map resolution adaptively using multi-resolution voxel maps based on the distance from the LiDAR center. Our proposed method has been tested in various challenging scenarios, demonstrating the effectiveness of the improvements we introduce. The code is open-source on GitHub: \href{https://github.com/chengwei0427/adaptive_lio}{Adaptive-LIO}.

adaptive-lio, artificial intelligence, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2503.05077

Country:

Asia > China > Heilongjiang Province (0.14)
Asia > China > Zhejiang Province (0.14)
Asia > China > Sichuan Province (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.54)
Information Technology > Robotics & Automation (0.54)
Automobiles & Trucks (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM

Zhang, Yao, Mao, Yuyi, Wang, Hui, Yu, Zhiwen, Guo, Song, Zhang, Jun, Wang, Liang, Guo, Bin

arXiv.org Artificial IntelligenceFeb-27-2025

Achieving real-time SLAM on mobile robotic systems with limited computational resources is challenging because the complexity of SLAM algorithms increases over time. This restriction can be lifted by offloading computations to edge servers, forming the emerging paradigm of edge-assisted SLAM. Nevertheless, the exogenous and stochastic input processes affect the dynamics of the edge-assisted SLAM system. Moreover, the requirements of clients on SLAM metrics change over time, exerting implicit and time-varying effects on the system. In this paper, we aim to push the limit beyond existing edge-assist SLAM by proposing a new architecture that can handle the input-driven processes and also satisfy clients' implicit and time-varying requirements. The key innovations of our work involve a regional feature prediction method for importance-aware local data processing, a configuration adaptation policy that integrates data compression/decompression and task offloading, and an input-dependent learning framework for task scheduling with constraint satisfaction. Extensive experiments prove that our architecture improves pose estimation accuracy and saves up to 47% of communication costs compared with a popular edge-assisted SLAM system, as well as effectively satisfies the clients' requirements. Index Terms --Simultaneous localization and mapping (SLAM), mobile edge computing (MEC), task offloading, task scheduling, and constrained reinforcement learning.

data mining, machine learning, reinforcement learning, (22 more...)

arXiv.org Artificial Intelligence

2502.16495

Country:

Asia > China > Shaanxi Province (0.14)
Asia > China > Heilongjiang Province (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Software (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

Add feedback

MTVHunter: Smart Contracts Vulnerability Detection Based on Multi-Teacher Knowledge Translation

Sun, Guokai, Zhuang, Yuan, Zhang, Shuo, Feng, Xiaoyu, Liu, Zhenguang, Zhang, Liguo

arXiv.org Artificial IntelligenceFeb-24-2025

Smart contracts, closely intertwined with cryptocurrency transactions, have sparked widespread concerns about considerable financial losses of security issues. To counteract this, a variety of tools have been developed to identify vulnerability in smart contract. However, they fail to overcome two challenges at the same time when faced with smart contract bytecode: (i) strong interference caused by enormous non-relevant instructions; (ii) missing semantics of bytecode due to incomplete data and control flow dependencies. In this paper, we propose a multi-teacher based bytecode vulnerability detection method, namely Multi-Teacher Vulnerability Hunter (MTVHunter), which delivers effective denoising and missing semantic to bytecode under multi-teacher guidance. Specifically, we first propose an instruction denoising teacher to eliminate noise interference by abstract vulnerability pattern and further reflect in contract embeddings. Secondly, we design a novel semantic complementary teacher with neuron distillation, which effectively extracts necessary semantic from source code to replenish the bytecode. Particularly, the proposed neuron distillation accelerate this semantic filling by turning the knowledge transition into a regression task. We conduct experiments on 229,178 real-world smart contracts that concerns four types of common vulnerabilities. Extensive experiments show MTVHunter achieves significantly performance gains over state-of-the-art approaches.

machine learning, natural language, vulnerability, (20 more...)

arXiv.org Artificial Intelligence

2502.16955

Country: Asia > China > Heilongjiang Province (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Banking & Finance > Economy (1.00)
Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Security & Privacy (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

Interpreting Operation Selection in Differentiable Architecture Search: A Perspective from Influence-Directed Explanations 4 1 Harbin Institute of Technology (Shenzhen)

Neural Information Processing SystemsFeb-12-2025, 01:19:39 GMT

The Differentiable ARchiTecture Search (DARTS) has dominated the neural architecture search community due to its search efficiency and simplicity. DARTS leverages continuous relaxation to convert the intractable operation selection problem into a continuous magnitude optimization problem which can be easily handled with gradient-descent, while it poses an additional challenge in measuring the operation importance or selecting an architecture from the optimized magnitudes. The vanilla DARTS assumes the optimized magnitudes reflect the importance of operations, while more recent works find this naive assumption leads to poor generalization and is without any theoretical guarantees. In this work, we leverage influence functions, the functional derivatives of the loss function, to theoretically reveal the operation selection part in DARTS and estimate the candidate operation importance by approximating its influence on the supernet with Taylor expansions. We show the operation strength is not only related to the magnitude but also secondorder information, leading to a fundamentally new criterion for operation selection in DARTS, named Influential Magnitude. Empirical studies across different tasks on several spaces show that vanilla DARTS and its variants can avoid most failures by leveraging the proposed theory-driven operation selection criterion.

artificial intelligence, international conference, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Heilongjiang Province > Harbin (0.40)
Asia > China > Guangdong Province > Shenzhen (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients Nanjing University of Information Science & Technology Harbin Institute of Technology

Neural Information Processing SystemsFeb-10-2025, 21:44:16 GMT

We consider escaping saddle points of nonconvex problems where only the function evaluations can be accessed. Although a variety of works have been proposed, the majority of them require either second or first-order information, and only a few of them have exploited zeroth-order methods, particularly the technique of negative curvature finding with zeroth-order methods which has been proven to be the most efficient method for escaping saddle points. To fill this gap, in this paper, we propose two zeroth-order negative curvature finding frameworks that can replace Hessian-vector product computations without increasing the iteration complexity. We apply the proposed frameworks to ZO-GD, ZO-SGD, ZO-SCSG, ZO-SPIDER and prove that these ZO algorithms can converge to (ϵ, δ)-approximate secondorder stationary points with less query complexity compared with prior zeroth-order works for finding local minima.

machine learning, natural language, saddle point, (17 more...)

Neural Information Processing Systems

Country: Asia > China > Heilongjiang Province > Harbin (0.40)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback